DiGrad: Multi-Task Reinforcement Learning with Shared Actions

Authors

  • Parijat Dewangan
  • S. Phani Teja
  • K. Madhava Krishna
  • Abhishek Sarkar
  • Balaraman Ravindran
Abstract

Most reinforcement learning algorithms are inefficient for learning multiple tasks in complex robotic systems, where different tasks share a set of actions. In such environments, a compound policy with shared neural network parameters may be learnt that performs multiple tasks concurrently. However, such a compound policy may become biased towards one task, or the gradients from different tasks may negate each other, making learning unstable and sometimes less data efficient. In this paper, we propose a new approach for the simultaneous training of multiple tasks sharing a set of common actions in continuous action spaces, which we call DiGrad (Differential Policy Gradient). The proposed framework is based on differential policy gradients and can accommodate multi-task learning in a single actor-critic network. We also propose a simple heuristic in the differential policy gradient update to further improve learning. The proposed architecture was tested on an 8-link planar manipulator and a 27 degrees of freedom (DoF) humanoid for learning multi-goal reachability tasks for 3 and 2 end effectors, respectively. We show that our approach supports efficient multi-task learning in complex robotic systems, outperforming related methods in continuous action spaces.
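The core idea in the abstract, one compound policy whose shared actions receive summed per-task policy gradients, can be illustrated with a toy sketch. Everything below (the linear policy, the quadratic per-task critics, the learning rate) is a hypothetical illustration under simplifying assumptions, not the paper's implementation; it only shows how deterministic-policy-gradient terms from several tasks can be accumulated into a single update for one set of shared parameters.

```python
import numpy as np

# Toy illustration (not the paper's exact algorithm): a deterministic
# linear policy a = W s is trained on two tasks that share its actions.
# Each task k has its own differentiable critic Q_k(s, a); the compound
# update sums the per-task terms dQ_k/da * da/dW instead of training a
# separate policy for each task.

rng = np.random.default_rng(0)
state_dim, action_dim = 4, 3
W = rng.normal(scale=0.1, size=(action_dim, state_dim))  # shared policy params

# Hypothetical per-task "critics": Q_k(s, a) = -||a - G_k s||^2, so the
# best action for task k alone is the task-specific target G_k s.
targets = [rng.normal(size=(action_dim, state_dim)) for _ in range(2)]

def grad_Q_wrt_a(a, s, G):
    """dQ/da for the quadratic critic Q = -||a - G s||^2."""
    return -2.0 * (a - G @ s)

lr = 0.05
for _ in range(500):
    s = rng.normal(size=state_dim)
    a = W @ s                          # shared actions for all tasks
    dW = np.zeros_like(W)
    for G in targets:                  # accumulate each task's gradient
        dW += np.outer(grad_Q_wrt_a(a, s, G), s)  # chain rule: dQ/da * da/dW
    W += lr * dW / len(targets)        # one update to the compound policy

# The shared policy converges to the average of the task targets: the
# compromise a single compound policy can reach on fully shared actions.
```

With these quadratic critics the summed gradient drives W towards the mean of the per-task targets, which makes concrete the failure mode the abstract mentions: when task gradients pull in opposing directions, a plain sum can cancel out, which is what a differential update rule aims to mitigate.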


Similar resources

Learning Shared Representations for Value Functions in Multi-task Reinforcement Learning

We investigate a paradigm in multi-task reinforcement learning (MT-RL) in which an agent is placed in an environment and needs to learn to perform a series of tasks within this space. Since the environment does not change, there is potentially a lot of common ground amongst tasks, and learning to solve them individually seems extremely wasteful. In this paper, we explicitly model and learn this...


Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs

Multi-agent Markov decision processes (MMDPs), the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems and serve as a suitable framework for multi-agent reinforcement learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, MMDP ...


Reinforcement Learning with Reusing Mechanism of Avoidance Actions and its Application to Learning Whole-Body Motions of Multi-Link Robot

When acquiring a motion purely from its objective by learning, complex motions such as a jumping serve incur large costs, for example damage from falling over, and require a large number of trials. Reusing knowledge already learnt is an essential mechanism for learning such motions efficiently, as humans do. In this study, we propose to use a decomposition of action-value functions as a r...


Argumentation Accelerated Reinforcement Learning for Cooperative Multi-Agent Systems

Multi-Agent Learning is a complex problem, especially in real-time systems. We address this problem by introducing Argumentation Accelerated Reinforcement Learning (AARL), which provides a methodology for defining heuristics, represented by arguments, and incorporates these heuristics into Reinforcement Learning (RL) by using reward shaping. We define AARL via argumentation and prove that it ca...


Multiagent Credit Assignment in a Team of Cooperative Q-Learning Agents with a Parallel Task

Traditionally, in much multi-agent reinforcement learning research, qualifying each individual agent's behavior is the responsibility of the environment's critic. However, in most practical cases, the critic is not completely aware of the effects of all agents' actions on the team performance. Using the agents' learning history, it is possible to judge the correctness of their actions. To do so, we use team common...



Journal:
  • CoRR

Volume abs/1802.10463  Issue 

Pages  -

Publication date 2018